
Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

Abstract

In machine learning, the parent set identification problem is to find a set of random variables that best explain a selected variable, given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed memory approach to the exact parent sets assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of currently available solutions.
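To make the problem statement concrete, here is a minimal single-machine sketch, assuming discrete data and a standard MDL form: the score of a candidate parent set is the negative log2-likelihood of the child plus (log2 N / 2) * k, where k = (r_X - 1) * q is the number of free parameters and q is the number of parent configurations. The sketch simply enumerates subsets and keeps the lowest score. It is not the authors' method. The helper names `mdl_score` and `best_parent_set` are hypothetical, and the fixed cap `max_parents` is a stand-in for the paper's MDL-derived search-space bounds, which are not reproduced here. The paper's exact MDL variant may also differ in constants.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_score(data, arities, child, parents):
    """MDL score (in bits) of `child` given the tuple `parents`; lower is better.

    Assumes `data` is a list of tuples of small integer codes and
    `arities[v]` is the number of states of variable v.
    """
    n = len(data)
    # N_{j,x}: counts of (parent configuration, child value) pairs
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_j: counts of parent configurations alone
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Negative log2-likelihood under maximum-likelihood conditional probabilities
    nll = -sum(c * math.log2(c / marg[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (r_child - 1) * product of parent arities (1 if no parents)
    k = (arities[child] - 1) * math.prod(arities[p] for p in parents)
    return nll + 0.5 * math.log2(n) * k


def best_parent_set(data, arities, child, max_parents):
    """Exhaustively score every candidate parent set of size <= max_parents.

    `max_parents` is a hypothetical user-supplied cap, not the paper's bound.
    """
    candidates = [v for v in range(len(arities)) if v != child]
    best = (mdl_score(data, arities, child, ()), ())  # start from the empty set
    for size in range(1, max_parents + 1):
        for parents in combinations(candidates, size):
            score = mdl_score(data, arities, child, parents)
            if score < best[0]:
                best = (score, parents)
    return best


# Toy data: X2 is an exact copy of X0, and X1 is independent noise.
rows = [(i % 2, (i // 2) % 2, i % 2) for i in range(64)]
print(best_parent_set(rows, arities=[2, 2, 2], child=2, max_parents=2))
# -> (6.0, (0,))
```

Here {X0} wins: it explains X2 perfectly with 2 free parameters, while {X0, X1} fits equally well but pays for 4. Each subset is scored independently of the others, which is the kind of work a distributed implementation could partition across workers. The paper's own Spark design, which reorganizes a dynamic program to eliminate fine-grain synchronization, is not shown here.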

Bibliographic details

Authors: Karan, Subhadeep; Zola, Jaroslaw
Year: 2017
Original format: PDF

Similar literature

1. Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark [J]. Arias Jacinto, Gamez Jose A., Puerta Jose M. Knowledge-Based Systems, 2017, issue FEBa.

2. Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics [J]. Lunga Dalton, Gerrand Jonathan, Yang Lexie. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020.

3. Scaling machine learning for target prediction in drug discovery using Apache Spark [J]. Dries Harnie, Mathijs Saey, Alexander E. Vapirev. Future Generation Computer Systems, 2017, issue feba.

4. Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark [C]. Subhadeep Karan, Jaroslaw Zola. 2017 IEEE 24th International Conference on High Performance Computing, 2017.

5. GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark [D]. Fu, Zishan. 2019.

6. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning [O]. V. Vineetha, C. L. Biji, Achuthsankar S. Nair.

7. Approximate Learning of High Dimensional Bayesian Network Structures via Pruning of Candidate Parent Sets [O]. Zhigao Guo, Anthony C. Constantinou. 2020.
